74 research outputs found

    Phronesis and Automated Science: The Case of Machine Learning and Biology

    The application of machine learning (ML) and deep learning to the natural sciences has fostered the idea that the automated nature of algorithmic analysis will gradually dispense with human beings in scientific work. In this paper, I will show that this view is problematic, at least when ML is applied to biology. In particular, I will claim that ML is not independent of human beings and cannot form the basis of automated science. Computer scientists conceive of their work as a case of Aristotle’s poiesis perfected by techne, which can be reduced to a number of straightforward rules and technical knowledge. I will show a number of concrete cases where, at each level of computational analysis, more is required of ML than just poiesis and techne, and that the work of ML practitioners in biology also needs the cultivation of something analogous to phronesis, which cannot be automated. But even if we knew how to frame phronesis into rules (which is inconsistent with its own definition), this virtue is still deeply entrenched in our biological constitution, which computers lack. Whether computers can fully perform scientific practice (which is the result of the way we are cognitively and biologically) independently of humans (and their cognitive and biological specificities) is an ill-posed question.

    Streaming histogram sketching for rapid microbiome analytics

    Background: The growth in publicly available microbiome data in recent years has yielded an invaluable resource for genomic research, allowing for the design of new studies, augmentation of novel datasets and reanalysis of published works. This vast amount of microbiome data, as well as the widespread proliferation of microbiome research and the looming era of clinical metagenomics, means there is an urgent need to develop analytics that can process huge amounts of data in a short amount of time. To address this need, we propose a new method for the compact representation of microbiome sequencing data using similarity-preserving sketches of streaming k-mer spectra. These sketches allow for dissimilarity estimation, rapid microbiome catalogue searching and classification of microbiome samples in near real time. Results: We apply streaming histogram sketching to microbiome samples as a form of dimensionality reduction, creating a compressed ‘histosketch’ that can efficiently represent microbiome k-mer spectra. Using public microbiome datasets, we show that histosketches can be clustered by sample type using the pairwise Jaccard similarity estimation, consequently allowing for rapid microbiome similarity searches via a locality sensitive hashing indexing scheme. Furthermore, we use a ‘real life’ example to show that histosketches can train machine learning classifiers to accurately label microbiome samples. Specifically, using a collection of 108 novel microbiome samples from a cohort of premature neonates, we trained and tested a random forest classifier that could accurately predict whether the neonate had received antibiotic treatment (97% accuracy, 96% precision) and could subsequently be used to classify microbiome data streams in less than 3 s. Conclusions: Our method offers a new approach to rapidly process microbiome data streams, allowing samples to be rapidly clustered, indexed and classified. We also provide our implementation, Histosketching Using Little K-mers (HULK), which can histosketch a typical 2 GB microbiome in 50 s on a standard laptop using four cores, with the sketch occupying 3000 bytes of disk space.
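As a rough illustration of the quantities named in the abstract above, the following sketch builds a k-mer spectrum (a count of every length-k substring) and computes an exact weighted Jaccard similarity between two spectra. This is not HULK's implementation: a histosketch approximates such a similarity from a small fixed-size summary, whereas this example computes it directly. The function names `kmer_spectrum` and `weighted_jaccard`, and the choice of the weighted (min/max) form of Jaccard, are assumptions made for illustration.

```python
from collections import Counter


def kmer_spectrum(sequence, k):
    """Count every overlapping k-mer in a sequence (hypothetical helper)."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def weighted_jaccard(spectrum_a, spectrum_b):
    """Exact weighted Jaccard similarity: sum of minima over sum of maxima.

    Assumption: this is the exact quantity that a similarity-preserving
    sketch would estimate; it is not taken from the HULK source code.
    """
    kmers = set(spectrum_a) | set(spectrum_b)
    numerator = sum(min(spectrum_a[x], spectrum_b[x]) for x in kmers)
    denominator = sum(max(spectrum_a[x], spectrum_b[x]) for x in kmers)
    # Two empty spectra are treated as identical.
    return numerator / denominator if denominator else 1.0


if __name__ == "__main__":
    a = kmer_spectrum("ACGT", 2)
    b = kmer_spectrum("ACGA", 2)
    print(weighted_jaccard(a, b))
```

Computing the similarity exactly requires holding both full spectra in memory, which is the cost that a fixed-size sketch is meant to avoid.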

    Linked read technology for assembling large complex and polyploid genomes

    Background: Short read DNA sequencing technologies have revolutionized genome assembly by providing high accuracy and throughput data at low cost. But it remains challenging to assemble short read data, particularly for large, complex and polyploid genomes. The linked read strategy has the potential to enhance the value of short reads for genome assembly because all reads originating from a single long molecule of DNA share a common barcode. However, the majority of studies to date that have employed linked reads were focused on human haplotype phasing and genome assembly. Results: Here we describe a de novo maize B73 genome assembly generated via linked read technology which contains ~172,000 scaffolds with an N50 of 89 kb that cover 50% of the genome. Based on comparisons to the B73 reference genome, 91% of linked read contigs are accurately assembled. Because it was possible to identify errors with > 76% accuracy using machine learning, it may be possible to identify and potentially correct systematic errors. Complex polyploids represent one of the last grand challenges in genome assembly. Linked read technology was able to successfully resolve the two subgenomes of the recent allopolyploid, proso millet (Panicum miliaceum). Our assembly covers ~83% of the 1 Gb genome and consists of 30,819 scaffolds with an N50 of 912 kb. Conclusions: Our analysis provides a framework for future de novo genome assemblies using linked reads, and we suggest computational strategies that if implemented have the potential to further improve linked read assemblies, particularly for repetitive genomes.
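The N50 figures quoted in the abstract above follow a standard definition: sort the scaffold lengths from longest to shortest and report the length at which the running total first reaches half of the total assembly length. A minimal sketch of that calculation is given below. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("N50 is undefined for an empty assembly")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running total to avoid fractional halves.
        if running * 2 >= total:
            return length


if __name__ == "__main__":
    # Toy scaffold lengths in arbitrary units (hypothetical data).
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because N50 depends only on the length distribution, it says nothing about whether the scaffolds are correctly assembled, which is why the abstract reports accuracy against the reference genome separately.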

    Asteroseismology and Interferometry

    Asteroseismology provides us with a unique opportunity to improve our understanding of stellar structure and evolution. Recent developments, including the first systematic studies of solar-like pulsators, have boosted the impact of this field of research within Astrophysics and have led to a significant increase in the size of the research community. In the present paper we start by reviewing the basic observational and theoretical properties of classical and solar-like pulsators and present results from some of the most recent and outstanding studies of these stars. We centre our review on those classes of pulsators for which interferometric studies are expected to provide a significant input. We discuss current limitations to asteroseismic studies, including difficulties in mode identification and in the accurate determination of global parameters of pulsating stars, and, after a brief review of those aspects of interferometry that are most relevant in this context, anticipate how interferometric observations may contribute to overcome these limitations. Moreover, we present results of recent pilot studies of pulsating stars involving both asteroseismic and interferometric constraints and look into the future, summarizing ongoing efforts concerning the development of future instruments and satellite missions which are expected to have an impact in this field of research. Comment: Version as published in The Astronomy and Astrophysics Review, Volume 14, Issue 3-4, pp. 217-36.

    Search for Gravitational Waves from Primordial Black Hole Binary Coalescences in the Galactic Halo

    We use data from the second science run of the LIGO gravitational-wave detectors to search for the gravitational waves from primordial black hole (PBH) binary coalescence with component masses in the range $0.2$--$1.0\,M_\odot$. The analysis requires a signal to be found in the data from both LIGO observatories, according to a set of coincidence criteria. No inspiral signals were found. Assuming a spherical halo with core radius 5 kpc extending to 50 kpc containing non-spinning black holes with masses in the range $0.2$--$1.0\,M_\odot$, we place an observational upper limit on the rate of PBH coalescence of 63 per year per Milky Way halo (MWH) with 90% confidence. Comment: 7 pages, 4 figures, to be submitted to Phys. Rev.

    CD133-positive hepatocellular carcinoma in an area endemic for hepatitis B virus infection

    Background: CD133 was detected in several types of cancers including hepatocellular carcinoma (HCC), which raised the possibility of stem cell origin in a subset of cancers. However, reappearance of embryonic markers in de-differentiated malignant cells was commonly observed. It remained to be elucidated whether CD133-positive HCCs were indeed of stem cell origin or they were just a group of poorly differentiated cells acquiring an embryonic marker. The aim of this study was to investigate the significance of CD133 expression in HCC in an area endemic for hepatitis B virus (HBV) infection to gain insights on this issue. Methods: 154 HCC patients receiving total removal of HCCs were included. 104 of them (67.5%) were positive for HBV infection. The cancerous and adjacent non-cancerous liver tissues were subjected to Western blot and immunohistochemistry analysis for CD133 expression. The data were correlated with clinical parameters, patient survival, and p53 expression. Results: Of 154 patients, 24 (15.6%) had CD133 expression in HCC. Univariate and multivariate logistic regression analysis revealed that CD133 expression was negatively correlated with the presence of hepatitis B surface antigen (HBsAg). The unadjusted and adjusted odds ratios were 0.337 (95% CI 0.126 - 0.890) and 0.084 (95% CI 0.010 - 0.707), respectively. On the other hand, p53 expression was positively associated with the presence of HBsAg in univariate analysis. The unadjusted odds ratio was 4.203 (95% CI 1.110 - 18.673). Survival analysis indicated that both CD133 and p53 expression in HCC predicted poor disease-free survival (P = 0.009 and 0.001, respectively), whereas only CD133 expression predicted poor overall survival (P = 0.001). Cox proportional hazard model showed that p53 and CD133 expression were two independent predictors for disease-free survival. The hazard ratios were 1.697 (95% CI 1.318 - 2.185) and 2.559 (95% CI 1.519 - 4.313), respectively (P < 0.001 for both). Conclusion: In an area where HBV infection accounts for the major attributive risk of HCC, CD133 expression in HCC was negatively associated with the presence of HBsAg, implicating a non-viral origin of CD133-positive HCC. Additionally, CD133 expression predicted poor disease-free survival independently of p53 expression, arguing for two distinguishable hepatocarcinogenesis pathways.

    Prediction and characterization of human ageing-related proteins by using machine learning

    Ageing has a huge impact on human health and economy, but its molecular basis – regulation and mechanism – is still poorly understood. To date, more than three hundred genes (almost all of which function as protein-coding genes) have been related to human ageing. Although individual ageing-related genes or some small subsets of these genes have been intensively studied, their analysis as a whole has been highly limited. To fill this gap, for each human protein we extracted 21000 protein features from various databases, and using these data as an input to state-of-the-art machine learning methods, we classified human proteins as ageing-related or non-ageing-related. We found a simple classification model based on only 36 protein features, such as the “number of ageing-related interaction partners”, “response to oxidative stress”, “damaged DNA binding”, “rhythmic process” and “extracellular region”. Predicted values of the model quantify the relevance of a given protein in the regulation or mechanisms of the human ageing process. Furthermore, we identified new candidate proteins having strong computational evidence of their important role in ageing. Some of them, like Cytochrome b-245 light chain (CY24A) and Endoribonuclease ZC3H12A (ZC12A), have no previous ageing-associated annotations.